ABSTRACT

Neural network models offer a theoretical testbed for the study of learning at
the cellular level. The only experimentally verified learning rule, Hebb's rule, is
extremely limited in its ability to train networks to perform complex tasks. An
identified cellular mechanism responsible for Hebbian-type long-term potentiation,
the NMDA receptor, is highly versatile. Its function and efficacy are modulated by a
wide variety of compounds and conditions and are likely to be directed by non-local
phenomena. Furthermore, it has been demonstrated that NMDA receptors are not
essential for some types of learning. We have shown that another neural network
learning rule, the chemotaxis algorithm, is theoretically much more powerful than
Hebb's rule and is consistent with experimental data. A biased random-walk in
synaptic weight space is a learning rule immanent in nervous activity and may
account for some types of learning - notably the acquisition of skilled movement.

KEY WORDS: biological neural networks, random walk, chemotaxis, stochas- 
tic optimization, biological plausibility. 

1. INTRODUCTION



In their landmark paper, "A Logical Calculus of the Ideas Immanent in Nervous
Activity", McCulloch and Pitts [1943] demonstrated how a network of extremely
simplified ("all-or-nothing") neurons could compute any Boolean function.
Mathematical analyses of modern neural network models have since revealed them to be
potentially universal computing devices [Siegelmann and Sontag 1991].

Neural network modeling has not only been helpful in understanding the collec- 
tive behavior of existing networks, but also provides a theoretical framework with 
which one can experiment with models of learning. Rosenblatt [1958] demonstrated 
that these networks, when endowed with modifiable connections ("perceptrons"),
could be "trained" to classify patterns (see also Arbib [1964; 1987]). Thus, Rosen- 
blatt had developed a theoretical testbed for the study of learning (formerly the 
near-exclusive domain of psychology) at the cellular level. 

Theoretical neural network studies (mathematical analyses and empirical com- 
puter simulations) are useful for exploring the capabilities and limitations of a pro- 
posed learning rule. The only experimentally verified learning rule, Hebb's rule, 
has profound limitations in this respect. Engineering optimization algorithms (such 
as back-propagation or genetic algorithms) are capable of training neural networks 
to perform much more sophisticated tasks, but are biologically implausible [Crick 
1989a,b; Mel 1990; Anderson 1991]. 

Long underestimated by both the experimental and theoretical neural network 
communities is perhaps the most intuitive mode of learning - trial-and-error. We 
have shown [Bremermann and Anderson 1989,1991] that the mathematical analog to 
trial-and-error, a Gaussian biased random-walk in synaptic weight space, is capable 
of training neural networks to perform the same complex, nonlinear mappings as 
backpropagation. 

In this paper, we review theoretical and empirical neural network studies of 
random-walk learning which demonstrate the effectiveness of this learning rule. We 
argue the biological plausibility of a trial-and-error learning rule through a discussion
of existing neurobiological data and identified molecular mechanisms. Finally, we 
identify the directions of experimental research most likely to identify its necessary 
elements. 



2. HEBB'S RULE 

In 1949, Hebb proposed a neuronal learning rule which could integrate
associative memories into neural networks [Hebb 1949]. Hebb postulated that when
one neuron repeatedly excites another, the synaptic knobs are strengthened.
Undoubtedly, the emerging dominance of behaviorism in many fields lent Hebb's rule
a certain amount of intellectual support. Hebb's rule is also appealing from a
genetic point of view, since it requires very little genetic "overhead" to implement
in actual nervous systems. All that is required is a mechanism for distinguishing 
simultaneous stimuli at the cellular level. 
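
As a minimal sketch of the rule just described, the following Python fragment
strengthens a connection in proportion to the product of presynaptic and
postsynaptic activity. The function name hebbian_update and the learning rate eta
are illustrative assumptions, not part of Hebb's original formulation.

```python
def hebbian_update(weights, pre, post, eta=0.1):
    """Return the weights after one Hebbian step: dw_i = eta * pre_i * post."""
    return [w + eta * x * post for w, x in zip(weights, pre)]

# Two input lines; only the first fires together with the postsynaptic cell.
w = [0.0, 0.0]
for _ in range(5):
    w = hebbian_update(w, pre=[1.0, 0.0], post=1.0)
print(w)  # only the weight of the co-active input has grown
```

Note that the update uses nothing but locally available activity, which is what
makes the rule cheap to implement and also what limits it.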

Verification has taken time, but there is now evidence that Hebbian-type long 
term potentiation (LTP) (with some modifications of the original hypothesis) does 
indeed occur [Lynch 1986; Kennedy 1988; Stevens 1989]. Long-term depression 
(LTD) has been observed in the same system supporting an ancillary "Hebbian 
covariance learning rule" [Stanton and Sejnowski 1989]. 

2.1 Experimental Evidence: The NMDA Receptor 

Long-term potentiation is mediated by the N-methyl-D-aspartate (NMDA)
receptor. It is useful to review the current model of the mechanisms of LTP for two
reasons. First, it illustrates how the proposed (Hebbian) learning rule influenced
experimental efforts. Secondly, the actual mechanisms discovered are subtly different
from the Hebbian ideal of strengthening correlated inputs.

According to the current model of LTP [Zalutsky and Nicoll 1990; Buonomano
and Byrne 1990; Kandel and O'Dell 1992], for the NMDA receptor channel to open,
two conditions must be met simultaneously: (i) the receptor must bind glutamate,
and (ii) the postsynaptic cell must be depolarized through activation of non-NMDA
receptors. At resting potential, the NMDA receptor channel is blocked by Mg2+.
Depolarization removes the voltage-dependent Mg2+ block, allowing Ca2+ to flow
into the cell. Ca2+ appears to trigger LTP through the activation of at least three
different protein kinases (see Fig. 1).
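
The coincidence requirement can be written as a simple logical conjunction. The
sketch below is a deliberately crude Boolean caricature under that assumption. The
function name is hypothetical, and real channel gating is graded and
voltage-dependent rather than all-or-nothing.

```python
def nmda_channel_open(glutamate_bound, depolarized):
    """Channel conducts Ca2+ only if both conditions hold at the same time."""
    mg_block_removed = depolarized  # depolarization relieves the Mg2+ block
    return glutamate_bound and mg_block_removed

# Truth table: only the coincident case admits calcium.
for glu in (False, True):
    for dep in (False, True):
        print(glu, dep, nmda_channel_open(glu, dep))
```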

There is also evidence for chemical and/or structural presynaptic changes
[Zalutsky and Nicoll 1990; Edwards 1991]. Presynaptic modification is thought to be
effected via retrograde messengers released across the synaptosomal junction. The
retrograde messenger is presumed to be a labile, diffusible substance synthesized
and released by the postsynaptic cell. The synthesis and/or release of such
messengers is likely to be a calcium-dependent process as well. Several substances
have been postulated to function as retrograde messengers. Among them are nitric
oxide [Gally et al. 1990], hydrogen peroxide [Colton et al. 1989; Zoccarato et al.
1989] and arachidonic acid [Williams et al. 1989]. (For a review, see Montague et al.
[1991].)

Many other substances have been shown to have modulatory effects on LTP. A 
partial list of proteins, hormones, neurotransmitters and other compounds includes 
glycine and D-serine [Salt 1989], serotonin [Ropert and Guy 1991], acetylcholine 
and noradrenaline [Bear and Singer 1986; Brocher et al. 1992], human epidermal 
growth factor [Abe and Saito 1992], antidepressant drugs [Birnstiel and Haas 1991], 
milacemide [Quartermain et al. 1991], opioids [Xie and Lewis 1991] and ethanol 
[Iorio et al. 1992]. Thus, it is not surprising that mental states and other factors
such as "attention", blood flow, "excitement", etc. can influence learning. That so
many compounds can modulate LTP indicates that the NMDA receptor may be a 
much more universal tool for synaptic modification, and not only employed in local, 
Hebbian-type learning. 

Finally, NMDA clearly mediates some, but not all, forms of learning. For 
instance, Malenfant et al. [1991] showed that application of an NMDA receptor 
antagonist (MK801) could block the acquisition of a spatial maze task in a dose- 
dependent manner. However, MK801 did not block the acquisition of experience- 
based maternal behavior. The same maternal experience effects can be blocked by 
chemical inhibition of protein synthesis. 

In summary, the NMDA receptor requires coincident events and makes possi- 
ble a type of associative learning. Its discovery required intricate experiments at 
synaptic junctions. It is currently unclear whether synaptic change occurs at the 
postsynaptic dendritic spine, the presynaptic glutamate axon terminal, the presy- 
naptic depolarizing axon, the axonal processes themselves, or a combination of all 
of these structures. Several chemical compounds have been identified which can 
facilitate or inhibit LTP. Many compounds which modulate LTP are common phys- 
iological chemical compounds, proteins or neurotransmitters and, as such, do not 
necessarily originate from either the pre- or postsynaptic neuron(s). Thus, it is 
conceivable that several forms of learning are operating in neural tissues, and these
other forms of learning can be mediated via the NMDA receptor as well as by other, 
independent neural processes. 

2.2 Limitations of Hebbian Learning 

Theoretically, Hebbian learning can account for some types of biological learn- 
ing. Hebbian mechanisms have been shown to be sufficient to account for topo- 
graphic mappings [Kohonen 1984; Grajski and Merzenich 1990], plasticity in corti- 
cal representation [Merzenich et al. 1987; Montague et al. 1991] and, when applied 
to "sigma-pi" neurons, some nonlinear pattern recognition tasks [Mel 1992]. But 
there is more to the brain than conditioned reflexes and associative memories. For 
anything but special cases, Hebb's rule is insufficient as a learning rule [Rosenblatt 
1962; Rumelhart and McClelland 1986]. 

Since Hebbian learning requires near simultaneous or synchronous stimuli, it 
is limited temporally. In many biological situations, instantaneous performance 
results are not available. Motor control tasks, for example, are inherently sequential. 
Temporal delays are also involved in many phenomena observed in psychophysical 
and electrophysiological studies of classical conditioning, such as anticipation of an 
unconditioned stimulus [Chester 1990; Deno 1991]. Hebbian learning would have to 
be combined with additional memory mechanisms or neuronal structures to account 
for such phenomena. Recent attempts to expand Hebbian learning rules to include 
short-term memory [Sutton and Barto 1983, Klopf 1989, Grossberg and Schmajuk 
1989] have met with limited success [Chester 1990]. 

To account for more complex phenomena, such as skilled movement, many
have postulated that the brain utilizes "model-reference control", that is, the brain
develops an internal model of the musculature and environment to predict the
performance of a control signal. A Hebbian mechanism can then be used to control such
a system, since presumably, the temporal delay has been removed from correlated 
events. Such a system may in fact be used, especially for rapid, open-loop eye and 
hand movements [Grossman 1983; Anderson and Vemuri 1992]. But the "model" 
must still be updated by a global supervisory signal which takes its cues from the 
external environment. 

Since the Hebbian rule applies only to correlations at the synaptic level, it is
also limited locally. Strengthening a local correlation in the context of a nonlinear 
mapping of several variables (such as the N-bit parity problem) often reduces overall 
performance. Consequently, Hebbian learning is unable to reliably train a multilayer 
perceptron network to learn arbitrary, nonlinear decision boundaries [Rumelhart 
and McClelland 1986]. 
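
A small worked example makes the locality problem concrete. Assuming a +1/-1
coding of inputs and targets (our choice for illustration), the sketch below
computes the average product of each input with the parity target over all
patterns. Every such correlation is zero for N >= 2, so a purely correlational
update on a single unit receives no consistent signal. This covers only the
simplest single-unit case, not a full multilayer analysis.

```python
from itertools import product

def parity_patterns(n):
    """All 2**n input patterns in +1/-1 coding with their parity targets."""
    patterns = []
    for bits in product([0, 1], repeat=n):
        x = [2 * b - 1 for b in bits]
        target = 1 if sum(bits) % 2 == 1 else -1
        patterns.append((x, target))
    return patterns

def input_target_correlations(n):
    """Mean of x_i * target over all patterns, one value per input line."""
    pats = parity_patterns(n)
    return [sum(x[i] * t for x, t in pats) / len(pats) for i in range(n)]

for n in (2, 3, 4):
    print(n, input_target_correlations(n))
```

Class membership depends on all N inputs jointly, which is exactly the
information a synapse-local correlation cannot see.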

3. THEORETICAL LEARNING RULES 

We have seen how influential a simple theoretical concept, Hebb's rule, has been 
in neurobiology. Current artificial neural network (ANN) research has provided 
valuable insights into the collective behavior of small networks of neurons [Hopfield 
1984; Lehky and Sejnowski 1988, 1990; Lockery et al. 1989]. However, most of these 
results were obtained using more sophisticated algorithms than Hebb's rule. Do 
any of the multitude of ANN learning rules have any implications for experimental 
neurobiology? 

Learning rules employed to train ANNs are more appropriately referred to as
optimization procedures. These algorithms, most of which are based on minimiza- 
tion of a defined error function, are capable of overcoming the limitations of Hebb's 
rule. Among the most popular today are genetic algorithms [Montana and Davis 
1989; Austin 1990] and gradient-descent learning [Rumelhart et al. 1986]. (For an 
overview of "connectionist" learning rules, see Hinton [1989].) Most of these algo- 
rithms have little biological basis and are used primarily for engineering problems 
in pattern recognition, classification, signal reconstruction, and so on. 

Criticisms of the biological plausibility of ANN training algorithms are
abundant in the literature. In his article "The recent excitement about neural
networks", Francis Crick [1989a] writes:

"It is hardly surprising that such achievements [referring to the successes
using back-propagation] have produced a heady sense of euphoria. But is this what
the brain actually does? Alas, the back-prop nets are unrealistic in almost every
respect.... Obviously what is really required is a brain-like algorithm which produces
results of the same general character as back propagation" [emphasis added].

Bartlett Mel [1990] poses the problem this way: 

"[I]s it... a fundamental law that neural associative learning algorithms must be
either representationally impoverished or mechanistically overcomplex?"

What are the necessary features of a biologically plausible learning rule? First, 
it must have a mechanism for synaptic modification that is consistent with exper- 
imental data. Secondly, a learning rule must not involve so much specific neural 
structure that an excessive number of genes are required for its coding. Lastly, 
to be of any use to biologists, it must be observable. Clearly, Hebb's rule satisfies 
these criteria, while back-propagation violates all three. As the title of this paper 
suggests, there is at least one other ANN learning rule that satisfies these criteria - 
a biased random-walk [Bremermann and Anderson 1989, 1991]. 

3.1 Learning via Random-Walks

In its most basic form, a random-walk can be generated by spontaneous, ran- 
dom variation in synaptic strength. This way, the mechanism for synaptic change 
is local and independent of any higher-level teaching signals. Changes in
architecture or synaptic strength are rewarded or punished after the fact. Such a
biased random-walk in synaptic weight space can be considered a cellular analog of
trial-and-error. 

The first attempt to apply such an algorithm to neural networks was by Lewey 
Gilstrap, Jr., Cook and Armstrong at Adaptronics, Inc. (McLean, VA) around 1970.
They called their algorithm "accelerated, guided random search" (GARS): 

"[T]he accelerated random search begins by exploring the vicinity of its initial 
estimate. The random trials are governed by a normal distribution of probabilities 
which is centered on the initial point. ... the accelerated random search follows an 
unsuccessful random step, with a step of equal magnitude in the opposite direction. 
By this means, a successful step is usually achieved on the second trial if not on the 
first random trial. ... A successful step is always followed by another step in the 
same direction ... each successive step is given double the magnitude of the prior 
step." [Barron 1968] 
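
The quoted procedure can be sketched in a few lines of Python. This is our
reading of the description above, not Barron's implementation. The function name,
the parameters sigma, iters and seed, and the choice of a quadratic test function
are all assumptions made for illustration.

```python
import random

def accelerated_random_search(f, x0, sigma=0.5, iters=200, seed=0):
    """Minimize f by random steps, reversal on failure, doubling on success.

    Assumes f is bounded below, otherwise the doubling loop need not stop.
    """
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    for _ in range(iters):
        # Normally distributed trial step centered on the current point.
        step = [rng.gauss(0.0, sigma) for _ in x]
        trial = [a + s for a, s in zip(x, step)]
        if f(trial) >= fx:
            # Unsuccessful: same magnitude in the opposite direction.
            step = [-s for s in step]
            trial = [a + s for a, s in zip(x, step)]
        # Successful steps are repeated in the same direction, doubled.
        while f(trial) < fx:
            x, fx = trial, f(trial)
            step = [2.0 * s for s in step]
            trial = [a + s for a, s in zip(x, step)]
    return x, fx

def sphere(v):
    return sum(a * a for a in v)

best, err = accelerated_random_search(sphere, [3.0, -2.0])
print(err)
```

Because a step is kept only when it lowers the error, the recorded error can
never increase from one accepted step to the next.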

Barron [1968; 1970] used GARS to optimize control parameters in flight control 
systems. Mucciardi [1972] applied GARS to neural net-like classification structures 
called "neuromine nets". Mucciardi's paper presented an analysis of neuromine nets
and the algorithm, but provided only simple examples of its application. Interest in 
neural networks was waning at that time, especially because of well-known limita- 
tions of simple perceptrons acknowledged by Rosenblatt [1962] and highlighted in 
Perceptrons [Minsky and Papert 1969]. Unfortunately, Mucciardi and his colleagues 
never applied their algorithm to the complex classification problems emphasized in 
Perceptrons - the exclusive OR and "connectedness" problems. Another aspect of 
random search, overlooked by the group at Adaptronics, was its potential relevance 
to biology. 

In 1988, we began experimenting with a similar algorithm, which we dubbed the 
"chemotaxis algorithm" [Bremermann and Anderson 1989, 1991; see also Appendix], 
by analogy to the strategy employed by bacteria to find chemoattractants in a spatial 
concentration gradient [Alt 1980; Koshland 1980; Berg 1983]. We showed that a 
biased Gaussian random-walk could, in fact, train neural networks to solve the same
difficult Boolean mappings that had eluded single layer perceptrons and Hebbian 
networks (exclusive OR, N-bit parity, etc.). 

Random-walk learning has not received much attention for several reasons:

Criticism #1: Random walks were known to get trapped in local minima in
conventional optimization problems.

In the case of neural networks, local minima are not as much of a problem as one
might expect. What is a local minimum in a small network with a lower dimensional
weight space often becomes a multi-dimensional saddle point in higher dimensions
[Baldi and Hornik 1989; Conrad and Ebeling 1992; Yu 1992]. This is because of
the degeneracy inherent in neural network architectures: there are usually a much 
larger number of free parameters (weights) than are theoretically required to solve 
the task at hand. 

Evolutionary optimization is also easier in high-dimensional, redundant systems 
[Conrad 1983]. A biased random-walk can be considered a rudimentary genetic 
algorithm - where the environment selects one of two possible mutant structures at 
each step. Conrad and Ebeling [1992] have shown that saddle points, not isolated 
peaks, dominate high dimensional fitness landscapes: "Increasing the dimensionality 
of a system... increases the chances of finding an uphill [favorable] pathway to still
higher peaks." Conrad refers to this phenomenon as "extradimensional bypass". 

Criticism #2: A random walk was thought to be inefficient. 

A biased random walk is also a form of gradient descent (random descent), and
is quite efficient. In the case of a 3-dimensional spherical gradient (a condition that
is ideal for gradient descent), the path taken to reach the optimum by the chemotaxis 
algorithm is, on average, only 39% longer than the optimal, direct gradient path 
[Bremermann 1974]. Empirical studies show that the chemotaxis algorithm, while 
usually slower to converge, compares favorably in final network performance with 
back-propagation on a variety of benchmark tasks [Bremermann and Anderson 1989; 
Wilson 1991]. Furthermore, in cases where local minima do exist, there is no reason 
to expect it is more prone to local minima than back-propagation [Anderson 1991; 
Baldi 1992]. An extensive analytical comparison of random descent and gradient 
descent learning is given by Baldi [1992]. 

Criticism #3: Neural network researchers generally did not believe a random 
walk could train neural networks to solve complex, nonlinear mappings such as the 
exclusive OR. 

The perceived problem of local minima reinforced this belief. This belief, how- 
ever, turned out simply to be unfounded [Bremermann and Anderson 1989] (Table 
I). In addition to the benchmark problems, the chemotaxis algorithm has been ap- 
plied successfully to training neural networks to solve a variety of problems: discrim- 
ination of seismic signals [Dowla et al. 1990; Anderson 1991], training "recurrent" 
neural networks [Anderson 1991], process control [Willis et al. 1991a,b], and mo- 
tor control [Anderson and Vemuri 1992; Styer and Vemuri 1992a,b]. Experiments 
with other stochastic training algorithms have had similar successes [Harth and 
Tzanakou 1974; Tzanakou et al. 1979; Harth et al. 1988; Smalz and Conrad 1991; 
Jabri and Flower 1992]. 

Criticism #4: "Reinforcement" learning models had not been presented in a
distilled, biologically plausible way.

Reinforcement signals are generally thought to carry only general information
about the overall performance ("good", "better", "target was missed by x amount",
etc.). Specific information to individual synapses as to their relative responsibility
in the task would be very difficult to determine. A biological mechanism for assigning
responsibility to each individual synapse is highly unlikely [Crick 1989a].

Most proposed reinforcement learning rules are "mechanistically overcomplex".
In Barto and Sutton's reinforcement learning schemes, for example, synaptic change 
is generated by the reinforcement signal itself, as interpreted by an adaptive critic 
element [Barto et al. 1981; Barto and Sutton 1983]. Although this work has gen- 
erated many interesting and non-trivial applications, the complexity of its synaptic 
adjustment rules makes it an unlikely candidate for a biological learning rule. Other 
reinforcement algorithms have similar drawbacks [Williams 1992]. Surprisingly, in 
a comparison between adaptive critic and chemotaxis in controlling a cart-pole
system, chemotaxis performed as well as or better than the more complicated (and less
biological) adaptive critic networks [Styer and Vemuri 1991a,b].

Criticism #5: Experimentalists are limited by what is observable.

The final, and most important, obstacle to finding biological evidence for
reinforcement learning has been, and continues to be, experimental observability. This
is because random walks are a non-local phenomenon. Experimental protocols in- 
volving single neurons, synapses, or even a small collection of interacting neurons 
cannot directly verify a non-local learning rule. Local measurements of a global 
phenomenon can only verify two of the necessary elements, local synaptic variation 
and neuromodulation (facilitation or inhibition of synaptic change). We devote the 
majority of the remainder of this article to addressing this problem. 

4. BIOLOGICAL EVIDENCE 

Reinforcement learning requires three components: (i) a mechanism for the
generation of synaptic change, (ii) a structure for evaluating performance, or "trainer",
and (iii) a reinforcement signal. To build a case for biological plausibility, we must
show that all of the necessary elements are consistent with biological observations. 

Two components required for random-walk learning are clearly consistent with
biological observations - random synaptic variation and neural structures for eval- 
uating performance. Indeed, it is generally believed that local random explorations 
account for some types of neural development [Montague et al. 1991]. In develop- 
mental models, however, the reinforcement signal is provided by the target cell. The 



Anderson, Random- Walks 



page 11 



random walk ends when a process finds its target. This type of locally reinforced 
random-walk has the same limitations as Hebbian learning. The difference with 
what is being proposed here is that the reinforcement signals are not generated 
locally, through retrograde messengers or cell-adhesion molecules. Instead, rein- 
forcement is generated and broadcast from "supervisory" neural structures (Figure 
2). 

4.1 Random Structural Variation 

Cellular events are dominated by stochastic processes. It is highly probable 
that the organism makes use of this fact in the process of learning. It has been 
shown that structural variation can be guided or influenced by chemical or neural 
signals. What remains to be determined is whether this modulation is a local phenomenon
or mediated by higher centers. Here, we cite just two examples of experimental 
systems which are consistent with this view. 

Growth of neurites in cerebellar granule cell cultures progresses stochasti- 
cally [Rashid and Cambray-Deakin 1992]. Stimulation with NMDA results in a 
marked increase in growth rate, while the addition of an NMDA receptor antag- 
onist, aminophosphonovalerate (APV), causes a marked retraction of pre-existing 
processes. Either of these effects could be directed from more distant neural struc- 
tures. 

In another experiment, Glanzman et al. [1990] studied an in vitro coculture 
of Aplysia sensory neurons and their target (L7 motor) cells. The sensorimotor 
cocultures were grown for 5 days and observed by fluorescence video micrographs. 
One group of preparations was repeatedly treated with the facilitating transmitter 
serotonin (5-HT) for 24 hours. At the end of the experiment, the coculture was 
imaged again to look for structural changes. Morphological changes (changes in 
the size of varicosities or new processes) at the junctions between the sensory and 
motor cells were rated on a subjective scale. This study was significant in that they 
were able to directly image structural changes - rather than relying on comparisons 
between two different populations of neurons. In the control group, morphological 
changes were found to be normally distributed with a mean change of zero on 
their rating scale. In the cocultures treated with serotonin, however, structural 
change was shown to be highly biased toward increases in varicosities or processes. 
Furthermore, they showed that these structural changes corresponded to measurable
changes in monosynaptic excitatory post-synaptic potential (EPSP) produced in L7 
motor cells by firing the sensory neuron. Thus, they were able to show both physical 
and electrophysiological facilitation can be induced in vitro by a single chemical 
signal - serotonin. 

We suggest that these random variations serve a vital role in learning, that 
is, generating new trial connections and efficacies. Serotonin release in a cluster of 
neurons may serve as a local "print" (or fixing) signal to retain effective changes. 
However, the experiment described by Glanzman et al. cannot differentiate between 
serotonin's putative role as a simple growth factor or a reinforcement signal. 

Serotonin has been shown to serve a role as a neuromodulator as well as a 
facilitation signal. There is evidence for a brainstem serotonergic projection to the 
ventrobasal thalamus, thus linking the facilitatory signal to higher brain centers [Eaton
and Salt 1989]. Does facilitation reinforce existing changes, or does the change 
occur as a result of the presence of serotonin? 

4.2 Reinforcement Signals 

A biased random-walk requires that the performance of a net be evaluated. 
This evaluation could be accomplished by other brain circuits. We do not consider 
this requirement problematic, since evaluation of performance tends to be compu- 
tationally easier than improvement. For example, throwing a ball requires precise 
coordination and timing of numerous muscles. Good performance is hard to achieve 
and may require extensive training. But, how close a ball comes to hitting the target 
is relatively easy to determine. Evaluation of accuracy can be processed separately 
by the visual cortex - independent of networks involved in generating the movement. 
One portion of the brain could thus act as "supervisor" for another system.
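
The ball-throwing example can be turned into a toy simulation in which the
"supervisor" only measures the miss distance, while improvement comes from random
variation that is kept after the fact. Everything in this sketch is an assumption
made for illustration: the ideal projectile model, the launch speed, the target
distance, and the function names.

```python
import math
import random

def landing_distance(angle_rad, speed=10.0, g=9.81):
    """Range of an ideal projectile launched from ground level."""
    return speed ** 2 * math.sin(2.0 * angle_rad) / g

def supervisor(angle_rad, target=8.0):
    """Evaluation only: how far the throw lands from the target."""
    return abs(landing_distance(angle_rad) - target)

def learn_throw(angle=0.2, sigma=0.05, trials=500, seed=0):
    """Keep a random change of launch angle only if the miss gets smaller."""
    rng = random.Random(seed)
    miss = supervisor(angle)
    for _ in range(trials):
        candidate = angle + rng.gauss(0.0, sigma)  # random variation
        new_miss = supervisor(candidate)           # scalar "how close" signal
        if new_miss < miss:                        # reinforce after the fact
            angle, miss = candidate, new_miss
    return angle, miss

final_angle, final_miss = learn_throw()
print(final_angle, final_miss)
```

The evaluator needs no knowledge of how the movement was generated. It reports a
single number, which is all the learning loop uses.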

The reinforcement signal is likely to carry only general, non-specific, infor- 
mation. Thus, it could be neural or chemical (hormonal) in origin. Many of the 
substances which have been shown to modulate LTP (including the candidate ret- 
rograde messengers) are candidate reinforcement signals as well. To complete a 
model of random-walk learning, one must demonstrate that other brain centers 
have projections to the sites of synaptic variation which release (directly or
indirectly) substances which can act to facilitate or inhibit the process of structural
change.

One known reverse pathway is a projection from the locus coeruleus to the 
olfactory bulb. Locus coeruleus neurons are known to have norepinephrine (NE) as 
a neurotransmitter. Gray et al. [1986] demonstrated that intrabulbar infusion of NE
into the rabbit olfactory bulb can prevent or delay the habituation to unreinforced 
odors. Locus coeruleus neurons are known to be activated by unconditioned stimuli 
[Aghajanian and Vandermaelen 1982], and several forms of use-dependent synaptic 
plasticity in cortical tissues require the presence of NE [Bliss et al. 1983; Bear 
and Singer 1986]. These signals from the locus coeruleus are diffuse but may still 
serve a neuromodulatory role [Crick 1989a]. Taken together, these data suggest 
that norepinephrine could be functioning as a reinforcement signal.

5. CONCLUSIONS 

It is self-evident that some form of trial-and-error learning is involved in the 
acquisition of skilled movement [Grossman 1959; Anderson 1981]. But training 
a tabula rasa of randomly connected masses of neurons to perform complex con- 
trol tasks is evidently a hopeless endeavour [Anderson 1991]. High level control of 
movement is thought to involve the coordination or modulation of existing Central 
Pattern Generators (CPGs) [Selverston 1980]. A biased random-walk can be used
to optimize a crudely organized network of CPGs during the acquisition of skilled
movement [Anderson 1991; Anderson and Vemuri 1992; Styer and Vemuri 1992a,b].
This is somewhat analogous to Edelman's selectionist hypothesis in that learning
entails the "selection", or education, of an existing repertoire of dynamical "groups"
[Edelman 1987; Crick 1989b]. Furthermore, we point out that the chemotaxis
algorithm is the most primitive form of trial-and-error; undoubtedly, more sophisticated,
higher level neural mechanisms will have evolved to coordinate and complement this
process [Smalz and Conrad 1991]. 

Experimental verification of this type of learning will require protocols involving 
collections or assemblies of neurons, rather than individual synaptic junctions, to 
observe the stochastic variation and the effects of putative reinforcement signals. 
Furthermore, a more ambitious effort must be made to link reinforcement signals
backwards to their projective sources. 

McCulloch and Pitts offered a solution to the embodiment problem by
demonstrating the computational properties of neural networks. Hebb proposed a
neurobiological correlate to associative learning or classical conditioning. Biased
random-walks in synaptic weight space can be seen as the neurobiological
"embodiment" of trial-and-error learning. A biased random walk may some day be shown
to be "a learning rule immanent in nervous activity".

APPENDIX

The Chemotaxis Algorithm 

The "chemotaxis training algorithm" consists of a biased random-walk in 
weight space. One advantage to this training method is that it does not require gra- 
dient calculations or detailed error signals. It also allows for automatic adjustment 
of the single learning parameter, which otherwise has to be found empirically. 

The network is initialized with an arbitrary set of weights, $w^0$, and
performance $E(w^0)$ is evaluated. A random vector $\Delta w$ is chosen from a
multivariate Gaussian distribution with zero mean and unit standard deviation. This
random vector is added to the current weights to create a "tentative" set of weights
($w^t$):

$$w^t = w^0 + h\,\Delta w$$

where $h$ is a stepsize parameter. Performance $E(w^t)$ is then calculated for the
tentative weights. If the error of the new configuration is lower than the original 
configuration, the tentative changes in the weight vector are retained; otherwise, 
the system reverts to its original configuration. If a successful direction in weight 
space is found, weight modifications continue along the same random vector until 
progress ceases. A new random vector is then chosen, and the process is repeated. 
More details are available in the cited literature. 
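
The steps above can be sketched in Python for the 2-bit XOR problem. This is a
minimal illustration under stated assumptions, not the authors' code: the sigmoid
units, the sum-of-squares error, the fixed stepsize h (the automatic stepsize
adjustment mentioned above is omitted), the stopping tolerance tol, and all
function names are our choices.

```python
import math
import random

XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
       ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
N_IN, N_HID = 2, 5                        # N-(2N+1)-1 architecture with N = 2
N_W = N_HID * (N_IN + 1) + (N_HID + 1)    # 21 weights, biases included

def sigmoid(z):
    # tanh form of the logistic function; avoids overflow for large |z|
    return 0.5 * (1.0 + math.tanh(0.5 * z))

def forward(w, x):
    """Output of the 2-5-1 network for input x and flat weight list w."""
    hidden = []
    for j in range(N_HID):
        base = j * (N_IN + 1)
        z = w[base + N_IN] + sum(w[base + i] * x[i] for i in range(N_IN))
        hidden.append(sigmoid(z))
    base = N_HID * (N_IN + 1)
    z = w[base + N_HID] + sum(w[base + j] * hidden[j] for j in range(N_HID))
    return sigmoid(z)

def error(w):
    """Sum of squared errors E(w) over the four XOR patterns."""
    return sum((forward(w, x) - t) ** 2 for x, t in XOR)

def chemotaxis(w0, h=0.5, max_trials=5000, tol=0.04, seed=1):
    """Biased random walk: a tentative step is kept only if E decreases."""
    rng = random.Random(seed)
    w, e = list(w0), error(w0)
    errors, trials = [e], 0
    while trials < max_trials and e > tol:
        dw = [rng.gauss(0.0, 1.0) for _ in w]      # zero mean, unit std
        while trials < max_trials:                 # reuse dw while it helps
            w_t = [a + h * d for a, d in zip(w, dw)]
            e_t = error(w_t)
            trials += 1
            if e_t >= e:
                break                              # revert: w is unchanged
            w, e = w_t, e_t
            errors.append(e)
    return w, e, errors, trials

init = random.Random(0)
w0 = [init.uniform(-1.0, 1.0) for _ in range(N_W)]
w, e, errors, trials = chemotaxis(w0)
print(f"initial E = {errors[0]:.3f}, final E = {e:.3f}, trials = {trials}")
```

No gradient is computed anywhere. The only quantity the update uses is whether
the scalar error went down, so the accepted errors form a strictly decreasing
sequence by construction.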

TABLE AND FIGURE CAPTIONS

Table I: Training Time for the N-bit Parity Problem. 

N-bit parity can be considered a generalization of the 2-bit "exclusive OR" 
(XOR) problem since class membership of a given pattern is dependent on all N 
inputs. Network architecture was N-(2N+1)-1, where N is the number of inputs and
2N+1 the number of hidden units. The networks were trained on all 2^N possible binary
input patterns.
Training was continued until the network responses were within 10% of the ideal 
Boolean values. Chemotaxis averages are taken from Bremermann and Anderson 
[1989]. No attempt was made to optimize algorithm parameters. Backpropagation 
averages are taken from Tesauro and Janssens [1988], who used optimal values for 
the learning and momentum parameters. Note that the computational effort is 
double these values in the case of backpropagation. 
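
To put the last remark in numbers, the following sketch divides the chemotaxis
epoch counts of Table I by twice the backpropagation counts. Treating "double" as
an exact per-epoch cost factor of 2 is our simplifying assumption, and the
function name is hypothetical.

```python
# Epoch counts copied from Table I: N -> (chemotaxis, backpropagation)
TABLE_I = {2: (113, 25), 3: (251, 33), 4: (962, 75),
           5: (1259, 130), 6: (4169, 310), 7: (5789, 800)}

def effort_ratio(n, backprop_cost_factor=2.0):
    """Chemotaxis epochs divided by cost-weighted backpropagation epochs."""
    chemo, backprop = TABLE_I[n]
    return chemo / (backprop_cost_factor * backprop)

for n in sorted(TABLE_I):
    print(n, round(effort_ratio(n), 2))
```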

Figure 1: NMDA Implementation of Hebbian Learning.

Simultaneous membrane depolarization and activation of the NMDA receptor 
allows calcium ions to flow into the cell. Calcium dependent proteins trigger a 
cascade of intracellular events leading to structural and/or chemical changes post- 
synaptically as well as potential presynaptic changes via retrograde messengers. 
(Adapted from [Montague et al. 1991; Kandel and O'Dell 1992].)

Figure 2: Neural Implementation of a Biased Random-Walk

Random variation in synaptic connectivity and efficacy is rewarded after the 
fact if performance has improved. Performance is evaluated by sensory systems 
(somatosensory, visual, auditory, etc.) and a non-specific, reinforcement signal is 
broadcast to the participating neural circuitry. The reinforcement signal could be 
chemical (hormonal) or neural in origin. 

Table I

Chemotaxis Algorithm Performance

Dimension (N)    Chemotaxis (epochs)    Backpropagation (epochs)

2 (XOR)          113                    25
3                251                    33
4                962                    75
5                1259                   130
6                4169                   310
7                5789                   800